Flexible Statistical Construction of Bilingual Dictionaries

نویسندگان

  • Ismael Pascual-Nieto
  • Michael O'Donnell
چکیده

Most previous systems for constructing a bilingual dictionary from a parallel corpus have depended on an iterative algorithm, using word translation probabilities to align words in the corpus, and using word alignments to estimate word translation probabilities, and repeating until convergence. While this approach produces reasonable results, it is computationally slow, limiting the size of the corpus that can be analysed and the size of the dictionary produced. We propose a non-iterative approach for producing a uni-directional bilingual dictionary which, while less accurate than iterative approaches, is far quicker, allowing larger corpora to be processed in reasonable time. The approach also allows on-the-fly estimation of translation likelihoods between a pair of terms, meaning that a translation dictionary can be generated with the n most frequent terms in an initial pass, and the translation likelihood of infrequent terms can be calculated as encountered in real documents.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

English-Chinese Transliteration Word Pair Extraction from Parallel Corpora

Bilingual dictionary construction is a time-consuming job; therefore many studies have recently focused on automatically constructing bilingual dictionaries from bilingual texts. In this paper, we propose two novel approaches called dynamic window and tokenizer based on statistical machine transliteration model to efficiently extract English-Chinese transliteration pairs from parallel corpora. ...

متن کامل

Bilingual Text, Matching using Bilingual Dictionary and Statistics

This paper describes a unified framework for bilingnal text matching by combining existing hand-written bilingual dictionaries and statistical techniques. The process of bilingual text matching consists of two major steps: sentence alignment and structural matching of bilingual sentences. Statistical techniques are apt plied to estimate word correspondences not included in bilingual dictionarie...

متن کامل

On multiword lexical units and their role in maritime dictionaries

Multi-word lexical units are a typical feature of specialized dictionaries, in particular monolingual and bilingual maritime dictionaries. The paper studies the concept of the multi-word lexical unit and considers the similarities and differences of their selection and presentation in monolingual and bilingual maritime dictionaries. The work analyses such issues as the classification of multi-w...

متن کامل

Construction of a Bilingual Dictionary Intermediated by a Third Language

When using a third language to construct a bilingual dictionary, it is necessary to discriminate equivalencies from inappropriate words derived as a result of ambiguity in the third language. We propose a method to treat this by utilizing the structures of dictionaries to measure the nearness of the meanings of words. The resulting dictionary is a word-to-word bilingual dictionary of nouns and ...

متن کامل

Automatic Detection of Multilingual Dictionaries on the Web

This paper presents an approach to query construction to detect multilingual dictionaries for predetermined language combinations on the web, based on the identification of terms which are likely to occur in bilingual dictionaries but not in general web documents. We use eight target languages for our case study, and train our method on pre-identified multilingual dictionaries and the Wikipedia...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Procesamiento del Lenguaje Natural

دوره 39  شماره 

صفحات  -

تاریخ انتشار 2007